{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "rzn3xV2ScXXN",
    "papermill": {
     "duration": 0.013538,
     "end_time": "2021-02-20T13:42:41.189745",
     "exception": false,
     "start_time": "2021-02-20T13:42:41.176207",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "# Analyzing RCT data with Precision Adjustment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "StJRPl8Lcsdd"
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import patsy\n",
    "import statsmodels.formula.api as smf\n",
    "import statsmodels.api as sm"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "U1aS4EabcXXr",
    "papermill": {
     "duration": 0.011347,
     "end_time": "2021-02-20T13:42:41.213041",
     "exception": false,
     "start_time": "2021-02-20T13:42:41.201694",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## Data\n",
    "\n",
    "In this lab, we analyze the Pennsylvania re-employment bonus experiment, which was previously studied in \"Sequential testing of duration data: the case of the Pennsylvania ‘reemployment bonus’ experiment\" (Bilias, 2000), among others. These experiments were conducted in the 1980s by the U.S. Department of Labor to test the incentive effects of alternative compensation schemes for unemployment insurance (UI). In these experiments, UI claimants were randomly assigned either to a control group or one of six treatment groups. Here we focus on treatment group 4, but feel free to explore other treatment groups. In the control group the current rules of UI applied. Individuals in the treatment groups were offered a cash bonus if they found a job within some pre-specified period of time (qualification period), provided that the job was retained for a specified duration. The treatments differed in the level of the bonus, the length of the qualification period, and whether the bonus was declining over time in the qualification period; see http://qed.econ.queensu.ca/jae/2000-v15.6/bilias/readme.b.txt for further details on data.\n",
    "  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "2qKDD0hecXX4",
    "papermill": {
     "duration": 0.279482,
     "end_time": "2021-02-20T13:42:41.503829",
     "exception": false,
     "start_time": "2021-02-20T13:42:41.224347",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "data = pd.read_csv(\"https://raw.githubusercontent.com/VC2015/DMLonGitHub/master/penn_jae.dat\", sep='\\\\s+')\n",
    "n, p = data.shape\n",
    "data = data[data[\"tg\"].isin([0, 4])]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "CWoXcstjcXX6",
    "papermill": {
     "duration": 0.065266,
     "end_time": "2021-02-20T13:42:41.580972",
     "exception": false,
     "start_time": "2021-02-20T13:42:41.515706",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "data[\"T4\"] = np.where(data[\"tg\"] == 4, 1, 0)\n",
    "data[\"T4\"].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "5q2hZMxScXX8",
    "papermill": {
     "duration": 0.053285,
     "end_time": "2021-02-20T13:42:41.646704",
     "exception": false,
     "start_time": "2021-02-20T13:42:41.593419",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "data.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "bgCUMjFWcXYB",
    "papermill": {
     "duration": 0.012846,
     "end_time": "2021-02-20T13:42:41.672662",
     "exception": false,
     "start_time": "2021-02-20T13:42:41.659816",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "### Model\n",
    "To evaluate the impact of the treatments on unemployment duration, we consider the linear regression model:\n",
    "\n",
    "$$\n",
    "Y =  D \\beta_1 + W'\\beta_2 + \\varepsilon, \\quad E \\varepsilon (D,W')' = 0,\n",
    "$$\n",
    "\n",
    "where $Y$ is  the  log of duration of unemployment, $D$ is a treatment  indicator,  and $W$ is a set of controls including age group dummies, gender, race, number of dependents, quarter of the experiment, location within the state, existence of recall expectations, and type of occupation.   Here $\\beta_1$ is the ATE, assuming the RCT assumptions hold. Within an RCT, the projection coefficient $\\beta_1$ has\n",
    "the interpretation of the causal effect of the treatment on\n",
    "the average outcome. We thus refer to $\\beta_1$ as the average\n",
    "treatment effect (ATE). Note that the covariates, here are\n",
    "independent of the treatment $D$, so we can identify $\\beta_1$ by\n",
    "just linear regression of $Y$ on $D$, without adding covariates.\n",
    "However, we do add covariates in an effort to improve the\n",
    "precision of our estimates of the average treatment effect.\n",
    "\n",
    "\n",
    "We also consider an interactive regression model:\n",
    "\n",
    "$$\n",
    "Y =  D \\alpha_1 + D W' \\alpha_2 + W'\\beta_2 + \\varepsilon, \\quad E \\varepsilon (D,W', DW')' = 0,\n",
    "$$\n",
    "where $W$'s are demeaned (apart from the intercept), so that $\\alpha_1$ is the ATE, assuming the RCT assumptions hold. Unlike the model without interactions, we are guaranteed to improve precision by considering the interactive model."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "XXlBcN0ucXYC",
    "papermill": {
     "duration": 0.012728,
     "end_time": "2021-02-20T13:42:41.724182",
     "exception": false,
     "start_time": "2021-02-20T13:42:41.711454",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "### Analysis\n",
    "\n",
    "We consider\n",
    "\n",
    "*  classical 2-sample approach, no adjustment (CL)\n",
    "*  classical linear regression adjustment (CRA)\n",
    "*  interactive regression adjusment (IRA)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "pWqd1FIJcXYD",
    "papermill": {
     "duration": 0.01276,
     "end_time": "2021-02-20T13:42:41.749736",
     "exception": false,
     "start_time": "2021-02-20T13:42:41.736976",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "# Carry out covariate balance check\n",
    "\n",
    "\n",
    "We first look at the coefficients individually with a $t$-test, and then we adjust the $p$-values to control for family-wise error."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "A3GrR6ZlcXYE",
    "papermill": {
     "duration": 0.013486,
     "end_time": "2021-02-20T13:42:41.776684",
     "exception": false,
     "start_time": "2021-02-20T13:42:41.763198",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "The regression below is done using \"type='HC1'\" which computes the correct Eicker-Huber-White standard errors, instead of the classical non-robust formula based on homoscedasticity."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Gh0JcsYyk0tb"
   },
   "outputs": [],
   "source": [
    "formula = (\"0 ~ \"\n",
    "           \"(female + black + othrace + C(dep) + q2 + q3 + q4 + q5 + q6 \"\n",
    "           \"+ agelt35 + agegt54 + durable + lusd + husd)**2\")\n",
    "X = patsy.dmatrix(formula, data=data, return_type='dataframe')\n",
    "X.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "BTnrfPdaoMj4"
   },
   "source": [
    "There is strong multicollinearity in the design matrix. In order to address this, we drop columns by thresholding using the QR decomposition."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "K76SXqZ6lTlx"
   },
   "outputs": [],
   "source": [
    "def qr_decomposition(x):\n",
    "    # Get a set of columns with no linear dependence for smallest subset of observations\n",
    "    Q, Rx = np.linalg.qr(x)\n",
    "    ex = np.abs(np.diag(Rx))\n",
    "    keep = np.where(ex > 1e-6)[0]\n",
    "    xS = x.iloc[:, keep]\n",
    "\n",
    "    return xS\n",
    "\n",
    "\n",
    "xS = qr_decomposition(X)\n",
    "xS.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "YXHvx7VYYHE2"
   },
   "outputs": [],
   "source": [
    "# individual t-tests\n",
    "covariate_balance = sm.OLS(data[\"T4\"], xS)\n",
    "cb_fit = covariate_balance.fit(cov_type=\"HC1\", use_t=True)\n",
    "cb_fit.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "GOgLqzjmM85J"
   },
   "source": [
    "To test balance conditions, we employ the Holm-Bonferroni step-down method. With 100+ hypotheses, the family-wise type I error, or the probability of making at least one type I error treating all hypotheses independently, is close to 1. To control for this, we adjust p-values with the following procedure.\n",
    "\n",
    "First, set $\\alpha=0.05$ and denote the list of $n$ p-values from the regression with the vector $p$.\n",
    "\n",
    "1. Sort $p$ from smallest to largest, so $p_{(1)} \\leq p_{(2)} \\leq \\cdots \\leq p_{(n)}$. Denote the corresponding hypothesis for $p_{(i)}$ as $H_{(i)}$.\n",
    "2. For $i=1,\\ldots, n$,\n",
    "- If $$p_{(i)} > \\frac{\\alpha}{n-i+1} $$ Break the loop and do not reject any $H_{(j)}$ for $j \\geq i$.\n",
    "- Else reject $H_{(i)}$ if $$p_{(i)} \\leq \\frac{\\alpha}{n-i+1} $$ Increment $i := i+1$.\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "slILLJ98QUib"
   },
   "outputs": [],
   "source": [
    "def holm_bonferroni(p, alpha=0.05):\n",
    "\n",
    "    n = len(p)\n",
    "    sig_beta = []\n",
    "\n",
    "    for i in range(n):\n",
    "        if np.sort(p)[i] > alpha / (n - i):\n",
    "            break\n",
    "        else:\n",
    "            sig_beta.append(np.argsort(p).iloc[i])\n",
    "            i += 1\n",
    "\n",
    "    return sig_beta\n",
    "\n",
    "\n",
    "print(\"Significant Coefficients (Indices): \", holm_bonferroni(cb_fit.pvalues, alpha=0.05))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "0AC_UNXGcXYF",
    "papermill": {
     "duration": 0.02549,
     "end_time": "2021-02-20T13:42:42.269256",
     "exception": false,
     "start_time": "2021-02-20T13:42:42.243766",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "We see that that even though this is a randomized experiment, balance conditions are failed.\n",
    "<!--\n",
    "The holm method fails to reject any hypothesis. That is, we fail to reject the hypothesis that any coefficient is zero. Thus, in this randomized experiment, balance conditions are met. -->"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "AlZe_Dr6cXYH",
    "papermill": {
     "duration": 0.021046,
     "end_time": "2021-02-20T13:42:42.315674",
     "exception": false,
     "start_time": "2021-02-20T13:42:42.294628",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "# Model Specification"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "CQPB5YfCcXYH",
    "papermill": {
     "duration": 0.350081,
     "end_time": "2021-02-20T13:42:42.680497",
     "exception": false,
     "start_time": "2021-02-20T13:42:42.330416",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# no adjustment (2-sample approach)\n",
    "cl = smf.ols(\"np.log(inuidur1) ~ T4\", data)\n",
    "\n",
    "# adding controls\n",
    "cra = smf.ols(\"np.log(inuidur1) ~ T4 + \"\n",
    "              \"(female+black+othrace+C(dep)+q2+q3+q4+q5+q6+agelt35+agegt54+durable+lusd+husd)**2\",\n",
    "              data)\n",
    "# Omitted dummies: q1, nondurable, muld\n",
    "\n",
    "cl_results = cl.fit(cov_type=\"HC0\")\n",
    "cl_est = cl_results.params['T4']\n",
    "cl_se = cl_results.HC0_se['T4']\n",
    "print(f\"ATE: {cl_est:.4f}, (std={cl_se:.4f})\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "EQl_YGSLNWAi"
   },
   "outputs": [],
   "source": [
    "cra_results = cra.fit(cov_type=\"HC0\")\n",
    "cra_est = cra_results.params['T4']\n",
    "cra_se = cra_results.HC0_se['T4']\n",
    "print(f\"ATE: {cra_est:.4f}, (std={cra_se:.4f})\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "R-G1Vqo7cXYI",
    "papermill": {
     "duration": 0.02768,
     "end_time": "2021-02-20T13:42:42.735966",
     "exception": false,
     "start_time": "2021-02-20T13:42:42.708286",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "The interactive specification corresponds to the approach introduced in Lin (2013)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "SWBol4-GcXYJ",
    "papermill": {
     "duration": 0.898452,
     "end_time": "2021-02-20T13:42:43.662684",
     "exception": false,
     "start_time": "2021-02-20T13:42:42.764232",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# interactive regression model\n",
    "\n",
    "ira_formula = \"0 + (female+black+othrace+C(dep)+q2+q3+q4+q5+q6+agelt35+agegt54+durable+lusd+husd)**2\"\n",
    "X = patsy.dmatrix(ira_formula, data, return_type='dataframe')\n",
    "X.columns = [f'x{t}' for t in range(X.shape[1])]  # clean column names\n",
    "X = (X - X.mean(axis=0))  # demean all control covariates\n",
    "\n",
    "# construct interactions of treatment and (de-meaned covariates, 1)\n",
    "ira_formula = \"T4 * (\" + \"+\".join(X.columns) + \")\"\n",
    "X['T4'] = data['T4']\n",
    "X = patsy.dmatrix(ira_formula, X, return_type='dataframe')\n",
    "ira = sm.OLS(np.log(data[[\"inuidur1\"]]), X)\n",
    "ira_results = ira.fit(cov_type=\"HC0\")\n",
    "ira_est = ira_results.params['T4']\n",
    "\n",
    "# because we de-meaned the covariates and interacted them with the treamtent\n",
    "# we need to account for the error in the means of X, which has a first-order\n",
    "# effect on the ATE estimate\n",
    "interaction_cols = list(map(lambda x: x.startswith('T4:'), list(X.columns)))\n",
    "cate = X.iloc[:, interaction_cols] @ ira_results.params[interaction_cols]\n",
    "ira_se = np.sqrt(ira_results.HC0_se['T4']**2 + np.var(cate) / X.shape[0])\n",
    "\n",
    "print(f\"ATE: {ira_est:.4f}, (std={ira_se:.4f})\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Ovx__MnpcXYK",
    "papermill": {
     "duration": 0.030243,
     "end_time": "2021-02-20T13:42:43.724231",
     "exception": false,
     "start_time": "2021-02-20T13:42:43.693988",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "# Partialling Out with the Lasso\n",
    "Next we try out partialling out with theoretically justified lasso. We use \"plug-in\" tuning with a theoretically valid choice of penalty $\\lambda = 2 \\cdot c \\hat{\\sigma} \\sqrt{n} \\Phi^{-1}(1-\\alpha/2p)$, where $c>1$ and $1-\\alpha$ is a confidence level, and $\\Phi^{-1}$ denotes the quantile function. See book for more details.\n",
    "\n",
    "We pull an analogue of R's rlasso. We find a Python code that tries to replicate the main function of hdm r-package. It was made by [Max Huppertz](https://maxhuppertz.github.io/code/). His library is this [repository](https://github.com/maxhuppertz/hdmpy). If not using colab, download its repository and copy this folder to your site-packages folder. In my case it is located here ***C:\\Python\\Python38\\Lib\\site-packages*** . We need to install this package ***pip install multiprocess***."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "SRp09vViOZLR"
   },
   "outputs": [],
   "source": [
    "!pip install multiprocess\n",
    "!pip install pyreadr\n",
    "!git clone https://github.com/maxhuppertz/hdmpy.git"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "zNDvJ4cAOa28"
   },
   "outputs": [],
   "source": [
    "import sys\n",
    "sys.path.insert(1, \"./hdmpy\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "RRmXUWRmObiM"
   },
   "outputs": [],
   "source": [
    "# We wrap the package so that it has the familiar sklearn API\n",
    "import hdmpy\n",
    "from sklearn.base import BaseEstimator\n",
    "\n",
    "\n",
    "class RLasso(BaseEstimator):\n",
    "\n",
    "    def __init__(self, *, post=True):\n",
    "        self.post = post\n",
    "\n",
    "    def fit(self, X, y):\n",
    "        self.rlasso_ = hdmpy.rlasso(X, y, post=self.post)\n",
    "        return self\n",
    "\n",
    "    def predict(self, X):\n",
    "        return np.array(X) @ np.array(self.rlasso_.est['beta']).flatten() + np.array(self.rlasso_.est['intercept'])\n",
    "\n",
    "    @property\n",
    "    def coef_(self,):\n",
    "        return np.array(self.rlasso_.est['beta']).flatten()\n",
    "\n",
    "\n",
    "def lasso_model():\n",
    "    return RLasso(post=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Non-Interactive Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y = np.log(data[\"inuidur1\"])\n",
    "D = data['T4']\n",
    "ira_formula = \"0 + (female+black+othrace+C(dep)+q2+q3+q4+q5+q6+agelt35+agegt54+durable+lusd+husd)**2\"\n",
    "X = patsy.dmatrix(ira_formula, data, return_type='dataframe')\n",
    "X = X.loc[:, X.std() > 0]  # drop constant columns\n",
    "\n",
    "Dres = D - np.mean(D)\n",
    "yres = y - lasso_model().fit(X, y).predict(X)\n",
    "\n",
    "cra_lasso_results = sm.OLS(yres, Dres).fit(cov_type='HC0')\n",
    "cra_lasso_est = cra_lasso_results.params['T4']\n",
    "cra_lasso_se = cra_lasso_results.HC0_se['T4']\n",
    "print(f\"ATE: {cra_lasso_est:.4f}, (std={cra_lasso_se:.4f})\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Interactive Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "eWcw5TBncXYK",
    "papermill": {
     "duration": 1.29547,
     "end_time": "2021-02-20T13:42:45.045172",
     "exception": false,
     "start_time": "2021-02-20T13:42:43.749702",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "X.columns = [f'x{t}' for t in range(X.shape[1])]  # clean column names\n",
    "X = (X - X.mean(axis=0))  # demean all control covariates\n",
    "\n",
    "# construct interactions of treatment and (de-meaned covariates, 1)\n",
    "ira_formula = \"T4 * (\" + \"+\".join(X.columns) + \")\"\n",
    "X['T4'] = data['T4']\n",
    "X = patsy.dmatrix(ira_formula, X, return_type='dataframe')\n",
    "# drop treatment column\n",
    "X = X.drop('T4', axis=1)\n",
    "X = X.loc[:, X.std() > 0]  # drop constant columns\n",
    "\n",
    "Dres = D - np.mean(D)\n",
    "modely = lasso_model().fit(X, y)\n",
    "yres = y - modely.predict(X)\n",
    "\n",
    "\n",
    "ira_lasso_results = sm.OLS(yres, Dres).fit(cov_type='HC0')\n",
    "ira_lasso_est = ira_lasso_results.params['T4']\n",
    "\n",
    "# because we de-meaned the covariates and interacted them with the treamtent\n",
    "# we need to account for the error in the means of X, which has a first-order\n",
    "# effect on the ATE estimate\n",
    "interaction_cols = list(map(lambda x: x.startswith('T4:'), list(X.columns)))\n",
    "cate = X.iloc[:, interaction_cols] @ modely.coef_[interaction_cols]\n",
    "ira_lasso_se = np.sqrt(ira_lasso_results.HC0_se['T4']**2 + np.var(cate) / X.shape[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "TNAQW7KwcXYQ",
    "papermill": {
     "duration": 0.030965,
     "end_time": "2021-02-20T13:42:45.107753",
     "exception": false,
     "start_time": "2021-02-20T13:42:45.076788",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "# Overall Results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "RmPEWZXxcXYT",
    "papermill": {
     "duration": 0.113372,
     "end_time": "2021-02-20T13:42:45.304786",
     "exception": false,
     "start_time": "2021-02-20T13:42:45.191414",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "results = {\n",
    "    \"CL\": [cl_est, cl_se],\n",
    "    \"CRA\": [cra_est, cra_se],\n",
    "    \"IRA\": [ira_est, ira_se],\n",
    "    \"CRA w Lasso\": [cra_lasso_est, cra_lasso_se],\n",
    "    \"IRA w Lasso\": [ira_lasso_est, ira_lasso_se],\n",
    "}\n",
    "results = pd.DataFrame(results)\n",
    "results.index = [\"Estimate\", \"Standard Error\"]\n",
    "results\n",
    "# for printing to LaTeX: print(results.to_latex())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "tT1XCKLhcXYU",
    "papermill": {
     "duration": 0.019862,
     "end_time": "2021-02-20T13:42:45.344887",
     "exception": false,
     "start_time": "2021-02-20T13:42:45.325025",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "Treatment group 4 experiences an average decrease of about $7.8\\%$ in the length of unemployment spell.\n",
    "\n",
    "\n",
    "Observe that regression estimators delivers estimates that are slighly more efficient (lower standard errors) than the simple 2 mean estimator, but essentially all methods have very similar standard errors.  We also see the regression estimators offer slightly lower estimates -- these difference occur perhaps to due minor imbalance in the treatment allocation, which the regression estimators try to correct.\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "odb5eio_AUfI"
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.3"
  },
  "papermill": {
   "default_parameters": {},
   "duration": 7.178818,
   "end_time": "2021-02-20T13:42:45.473549",
   "environment_variables": {},
   "exception": null,
   "input_path": "__notebook__.ipynb",
   "output_path": "__notebook__.ipynb",
   "parameters": {},
   "start_time": "2021-02-20T13:42:38.294731",
   "version": "2.2.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
